Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
    by baskerville.CS.Arizona.EDU (8.9.1a/8.9.1) with SMTP id IAA06348
    for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Mon, 14 Sep 1998 08:24:11 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
    id AA01625; Mon, 14 Sep 1998 08:23:43 -0700
From: gep2@computek.net
Date: Sat, 12 Sep 1998 05:39:34 -0500 (CDT)
Message-Id: <199809121039.FAA15753@mail.cmpu.net>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Subject: Re: Unicode support or support for non-Ascii based character manipulation?
To: icon-group@optima.CS.Arizona.EDU
X-Mailer: SPRY Mail Version: 04.00.06.17
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO

> ASCII is also NOT adequate for many purposes even in the United States.

Of course. Icon isn't the language of choice for writing device drivers, either. The point is not that it's not perfect FOR EVERYTHING, but that it's damned useful (and reasonably efficient) just as it is. If something has to be built ON TOP OF IT, then the person who needs those higher-level functions has many of the tools needed to do that. You shouldn't force EVERYBODY to use the higher-level functions just because SOMEBODY, SOMEWHERE decides that they want them.

> Almost every word processor has their own incompatible way of representing diacritical marks and characters that were omitted from ASCII.

Sure. And there are all kinds of other details which are not readily represented by any simple character set customization, too (complex mathematical formulas, for example). I wouldn't insist on using TeX (or whatever) to produce my clients' invoices just because somebody needed that level of functionality for the particular stuff that THEY are doing.

> (By the way, did you know that there are other countries in the Western Hemisphere besides the United States? And most of them don't speak English?)

Don't be condescending.

> I work in a library, and libraries found plain ASCII inadequate all the way back in the early 1960s, when the computer programmers were still bitching about people who wanted lowercase letters.

It's already been pointed out in this thread that the 65536 unique characters allowed in Unicode aren't even universally agreed as being "enough" and there are already people jockeying to push for more.

Personally, ASCII works plenty well enough that I am not eager to see (for example) my E-mail archives blossom to twice the disk space they already take up. I'm not eager to see Web pages take twice as long to download as they need to. One poster mentioned a compression scheme to allow them to be smaller than 2x, but that comes at added complexity (which isn't pretty either).

>> If other countries have more difficult (or huge) character sets,
> that is (while a fact of life) simply an inherent disadvantage
> of their culture (and note that I'm not intending that as a slam
> or value judgement, it just IS the way it is), and I don't see a
> terribly convincing argument why the other countries (without
> that disadvantage) ought to pay the price too, just in order to
> artificially level the playing field.

> Many of those non-Roman character sets are no more difficult than Roman.

True enough, but supporting all of them simultaneously is undeniably more difficult and complex (for example, mixing left-to-right and right-to-left languages on the same line is a major pain). So let's say for example that you're going to try to support English and Hebrew within a single string. Are you going to ask Icon's string scanning to scan the resulting mess in the "correct" way (i.e.
left to right through the English part, then jumping to the right-end-beginning of the following Hebrew word, continuing to the left, then jumping across the Hebrew word to continue with the English word to its right)? And are you going to ask programmers of high-level applications to design their programs so that these kinds of exceptions and special cases for exotic character sets all work within their applications?

Frankly, I just don't think it's usually worth it. Sure, some people need such things. For them, there are special-purpose multilanguage word processors (Multi-Lingual Scholar, Nota Bene, and numerous others). And I still think Icon is as good or better than most other languages for writing applications that have to do weird stuff with character strings. But I don't really want to have to have all that paraphernalia around and imposing itself on me all the time.

> The United States is not an island. Closing our eyes and pretending that the rest of the world doesn't exist and doesn't buy our software would be a bad idea even if it was possible.

That's fine, but we're also large enough that we can (and probably should) allow ourselves to benefit from the efficiency that our systems make possible (_where_ possible!) The companies I typically consult with certainly don't need Unicode to produce their accounting reports, nor for their invoicing or communications or other functions. It would be ridiculous to build those kinds of back-office applications the same way (even using the same tools, probably) that one would build a program that had to handle Sanskrit or Chinese or something.

> If you're concerned about efficiency, maybe you should worry about all the gratuitous graphics.

I talk about my attitude about such things at my Web site (see the FAQ).

> Over uncompressed ASCII, compressed Unicode uses little to no more disk or tape space. Compressing and uncompressing strings adds some complexity,

More than a little, I think.
Frankly, I don't think (for the great majority of users here) that the added complexity comes with enough benefit to justify the hassle.

> but you get some simplicity by not having to keep track of which character set you're in and switching back and forth between character sets within what is logically one string.

Character sets are not by themselves enough (as I discuss above regarding character DIRECTION also changing within a string). Once you decide to support EVERYBODY's weird stuff in the basic system, you're starting down a very slippery slope, with the probability that you're STILL not going to please everybody... and that by trying, you end up pleasing NOBODY at all.

Gordon Peterson
http://www.computek.net/public/gep2/

Support the Anti-SPAM Amendment! Join at http://www.cauce.org/
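
The storage figures argued over in this message (ASCII text doubling in size as 16-bit Unicode, and compression winning that space back) are easy to check for oneself. What follows is only a minimal sketch in Python, not anything from the thread: the sample text is a hypothetical invoice line, UTF-16 is assumed as the 16-bit encoding under discussion, and zlib is a stand-in for whatever compression scheme the other poster had in mind. UTF-8 is included for comparison because it stores ASCII characters as the same single bytes.

```python
import zlib

# Hypothetical sample: plain English back-office text, repeated so the
# compressor has something to work with. Real mail or invoices will differ.
sample = "Invoice 1042: 3 widgets at $19.95 each, payment due in 30 days.\n" * 200

ascii_bytes = sample.encode("ascii")      # one byte per character
utf16_bytes = sample.encode("utf-16-le")  # two bytes per character for this text, no BOM
utf8_bytes = sample.encode("utf-8")       # byte-for-byte identical to ASCII for this text

# zlib is a general-purpose compressor used here as a stand-in;
# it is not the scheme referred to in the thread.
utf16_packed = zlib.compress(utf16_bytes)

print("ASCII bytes:            ", len(ascii_bytes))
print("UTF-16 bytes:           ", len(utf16_bytes))
print("UTF-8 bytes:            ", len(utf8_bytes))
print("UTF-16 + zlib bytes:    ", len(utf16_packed))
```

For pure-ASCII text the UTF-16 form is exactly twice the ASCII size and the UTF-8 form is the same size as ASCII. The compressed figure depends heavily on the input: this sample is deliberately repetitive, so it shrinks far more than ordinary text would, and the numbers show only the shape of the trade-off, with the compress and decompress step being the added complexity complained about above.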